AITopics | relative pose estimation

Collaborating Authors

relative pose estimation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Point-PNG: Conditional Pseudo-Negatives Generation for Point Cloud Pre-Training

Mahendren, Sutharsan, Rahman, Saimunur, Koniusz, Piotr, Fernando, Tharindu, Sridharan, Sridha, Fookes, Clinton, Moghadam, Peyman

arXiv.org Artificial IntelligenceDec-8-2025

We propose Point-PNG, a novel self-supervised learning framework that generates conditional pseudo-negatives in the latent space to learn point cloud representations that are both discriminative and transformation-sensitive. Conventional self-supervised learning methods focus on achieving invariance, discarding transformation-specific information. Recent approaches incorporate transformation sensitivity by explicitly modeling relationships between original and transformed inputs. However, they often suffer from an invariant-collapse phenomenon, where the predictor degenerates into identity mappings, resulting in latent representations with limited variation across transformations. To address this, we propose Point-PNG that explicitly penalizes invariant collapse through pseudo-negatives generation, enabling the network to capture richer transformation cues while preserving discriminative representations. To this end, we introduce a parametric network, COnditional Pseudo-Negatives Embedding (COPE), which learns localized displacements induced by transformations within the latent space. A key challenge arises when jointly training COPE with the MAE, as it tends to converge to trivial identity mappings. To overcome this, we design a loss function based on pseudo-negatives conditioned on the transformation, which penalizes such trivial invariant solutions and enforces meaningful representation learning. We validate Point-PNG on shape classification and relative pose estimation tasks, showing competitive performance on ModelNet40 and ScanObjectNN under challenging evaluation protocols, and achieving superior accuracy in relative pose estimation compared to supervised baselines.

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2409.15832

Country:

Europe (0.93)
Oceania > Australia > Queensland (0.28)

Genre:

Research Report (0.64)
Personal (0.46)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Semi-distributed Cross-modal Air-Ground Relative Localization

Lu, Weining, Bin, Deer, Ma, Lian, Ma, Ming, Ma, Zhihao, Chen, Xiangyang, Wang, Longfei, Feng, Yixiao, Jiang, Zhouxian, Shi, Yongliang, Liang, Bin

arXiv.org Artificial IntelligenceNov-11-2025

Efficient, accurate, and flexible relative localization is crucial in air-ground collaborative tasks. However, current approaches for robot relative localization are primarily realized in the form of distributed multi-robot SLAM systems with the same sensor configuration, which are tightly coupled with the state estimation of all robots, limiting both flexibility and accuracy. To this end, we fully leverage the high capacity of Unmanned Ground Vehicle (UGV) to integrate multiple sensors, enabling a semi-distributed cross-modal air-ground relative localization framework. In this work, both the UGV and the Unmanned Aerial Vehicle (UAV) independently perform SLAM while extracting deep learning-based keypoints and global descriptors, which decouples the relative localization from the state estimation of all agents. The UGV employs a local Bundle Adjustment (BA) with LiDAR, camera, and an IMU to rapidly obtain accurate relative pose estimates. The BA process adopts sparse keypoint optimization and is divided into two stages: First, optimizing camera poses interpolated from LiDAR-Inertial Odometry (LIO), followed by estimating the relative camera poses between the UGV and UAV. Additionally, we implement an incremental loop closure detection algorithm using deep learning-based descriptors to maintain and retrieve keyframes efficiently. Experimental results demonstrate that our method achieves outstanding performance in both accuracy and efficiency. Unlike traditional multi-robot SLAM approaches that transmit images or point clouds, our method only transmits keypoint pixels and their descriptors, effectively constraining the communication bandwidth under 0.3 Mbps. Codes and data will be publicly available on https://github.com/Ascbpiac/cross-model-relative-localization.git.

artificial intelligence, descriptor, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.06749

Genre: Research Report (0.70)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs

Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou

Neural Information Processing SystemsOct-2-2025, 12:31:39 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, descriptor, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images

Käs, Stephanie, Peter, Sven, Thillmann, Henrik, Burenko, Anton, Adrian, David Benjamin, Mack, Dennis, Linder, Timm, Leibe, Bastian

arXiv.org Artificial IntelligenceJun-25-2025

Fisheye cameras offer robots the ability to capture human movements across a wider field of view (FOV) than standard pinhole cameras, making them particularly useful for applications in human-robot interaction and automotive contexts. However, accurately detecting human poses in fisheye images is challenging due to the curved distortions inherent to fisheye optics. While various methods for undistorting fisheye images have been proposed, their effectiveness and limitations for poses that cover a wide FOV has not been systematically evaluated in the context of absolute human pose estimation from monocular fisheye images. To address this gap, we evaluate the impact of pinhole, equidistant and double sphere camera models, as well as cylindrical projection methods, on 3D human pose estimation accuracy. We find that in close-up scenarios, pinhole projection is inadequate, and the optimal projection method varies with the FOV covered by the human pose. The usage of advanced fisheye models like the double sphere model significantly enhances 3D human pose estimation accuracy. We propose a heuristic for selecting the appropriate projection model based on the detection bounding box to enhance prediction quality. Additionally, we introduce and evaluate on our novel dataset FISHnCHIPS, which features 3D human skeleton annotations in fisheye images, including images from unconventional angles, such as extreme close-ups, ground-mounted cameras, and wide-FOV poses, available at: https://www.vision.rwth-aachen.de/fishnchips

artificial intelligence, pose estimation, video understanding, (16 more...)

arXiv.org Artificial Intelligence

2506.19747

Country: Europe > Germany (0.04)

Genre: Research Report (0.82)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)

Add feedback

REMOTE: Real-time Ego-motion Tracking for Various Endoscopes via Multimodal Visual Feature Learning

Shao, Liangjing, Chen, Benshuang, Zhao, Shuting, Chen, Xinrong

arXiv.org Artificial IntelligenceFeb-2-2025

Real-time ego-motion tracking for endoscope is a significant task for efficient navigation and robotic automation of endoscopy. In this paper, a novel framework is proposed to perform real-time ego-motion tracking for endoscope. Firstly, a multi-modal visual feature learning network is proposed to perform relative pose prediction, in which the motion feature from the optical flow, the scene features and the joint feature from two adjacent observations are all extracted for prediction. Due to more correlation information in the channel dimension of the concatenated image, a novel feature extractor is designed based on an attention mechanism to integrate multi-dimensional information from the concatenation of two continuous frames. To extract more complete feature representation from the fused features, a novel pose decoder is proposed to predict the pose transformation from the concatenated feature map at the end of the framework. At last, the absolute pose of endoscope is calculated based on relative poses. The experiment is conducted on three datasets of various endoscopic scenes and the results demonstrate that the proposed method outperforms state-of-the-art methods. Besides, the inference speed of the proposed method is over 30 frames per second, which meets the real-time requirement. The project page is here: remote-bmxs.netlify.app

artificial intelligence, image understanding, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2501.18124

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Greece (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.90)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles

Fang, Susu, Li, Hao

arXiv.org Artificial IntelligenceNov-26-2024

The SLAMMOT, i.e. simultaneous localization, mapping, and moving object (detection and) tracking, represents an emerging technology for autonomous vehicles in dynamic environments. Such single-vehicle systems still have inherent limitations, such as occlusion issues. Inspired by SLAMMOT and rapidly evolving cooperative technologies, it is natural to explore cooperative simultaneous localization, mapping, moving object (detection and) tracking (C-SLAMMOT) to enhance state estimation for ego-vehicles and moving objects. C-SLAMMOT could significantly upgrade the single-vehicle performance by utilizing and integrating the shared information through communication among the multiple vehicles. This inevitably leads to a fundamental trade-off between performance and communication cost, especially in a scalable manner as the number of collaboration vehicles increases. To address this challenge, we propose a LiDAR-based communication-efficient C-SLAMMOT (CE C-SLAMMOT) method by determining the number of collaboration vehicles. In CE C-SLAMMOT, we adopt descriptor-based methods for enhancing ego-vehicle pose estimation and spatial confidence map-based methods for cooperative object perception, allowing for the continuous and dynamic selection of the corresponding critical collaboration vehicles and interaction content. This approach avoids the waste of precious communication costs by preventing the sharing of information from certain collaborative vehicles that may contribute little or no performance gain, compared to the baseline method of exchanging raw observation information among all vehicles. Comparative experiments in various aspects have confirmed that the proposed method achieves a good trade-off between performance and communication costs, while also outperforms previous state-of-the-art methods in cooperative perception performance.

collaboration vehicle, estimation, vehicle, (15 more...)

arXiv.org Artificial Intelligence

2411.17432

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > California > Los Angeles County > Culver City (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Multi-LED Classification as Pretext For Robot Heading Estimation

Carlotti, Nicholas, Nava, Mirko, Giusti, Alessandro

arXiv.org Artificial IntelligenceOct-6-2024

Our chosen model is a Vision-based relative pose estimation of peer robots is a Fully Convolutional Network (FCN) [8] that, given an image, key capability for many multi-robot applications [1]. While predicts two-dimensional maps: one for the state of each some approaches focused on handcrafted algorithms, based LED, one representing the model's belief about the robot on circles printed on paper sheets [2] or exploiting the image-space position, and one about its heading. We take LEDs geometry [3], these systems lacked adaptability to full advantage of the inductive bias of FCNs, in which the different environmental conditions. Recent solutions train value of a pixel in the output map depends on a limited deep neural networks (DNNs) on large amounts of data.

estimation, led, robot, (12 more...)

arXiv.org Artificial Intelligence

2410.04536

Country:

Europe > Switzerland (0.04)
Europe > Netherlands > South Holland > Rotterdam (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Trajectory Regularization Enhances Self-Supervised Geometric Representation

Wang, Jiayun, Yu, Stella X., Chen, Yubei

arXiv.org Artificial IntelligenceMar-22-2024

Self-supervised learning (SSL) has proven effective in learning high-quality representations for various downstream tasks, with a primary focus on semantic tasks. However, its application in geometric tasks remains underexplored, partially due to the absence of a standardized evaluation method for geometric representations. To address this gap, we introduce a new pose-estimation benchmark for assessing SSL geometric representations, which demands training without semantic or pose labels and achieving proficiency in both semantic and geometric downstream tasks. On this benchmark, we study enhancing SSL geometric representations without sacrificing semantic classification accuracy. We find that leveraging mid-layer representations improves pose-estimation performance by 10-20%. Further, we introduce an unsupervised trajectory-regularization loss, which improves performance by an additional 4% and improves generalization ability on out-of-distribution data. We hope the proposed benchmark and methods offer new insights and improvements in self-supervised geometric representation learning.

estimation, pose estimation, representation, (15 more...)

arXiv.org Artificial Intelligence

2403.14973

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > Finland (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Gaussian-Sum Filter for Range-based 3D Relative Pose Estimation in the Presence of Ambiguities

Ahmed, Syed S., Shalaby, Mohammed A., Cossette, Charles C., Ny, Jerome Le, Forbes, James R.

arXiv.org Artificial IntelligenceFeb-13-2024

Multi-robot systems must have the ability to accurately estimate relative states between robots in order to perform collaborative tasks, possibly with no external aiding. Three-dimensional relative pose estimation using range measurements oftentimes suffers from a finite number of non-unique solutions, or ambiguities. This paper: 1) identifies and accurately estimates all possible ambiguities in 2D; 2) treats them as components of a Gaussian mixture model; and 3) presents a computationally-efficient estimator, in the form of a Gaussian-sum filter (GSF), to realize range-based relative pose estimation in an infrastructure-free, 3D, setup. This estimator is evaluated in simulation and experiment and is shown to avoid divergence to local minima induced by the ambiguous poses. Furthermore, the proposed GSF outperforms an extended Kalman filter, demonstrates similar performance to the computationally-demanding particle filter, and is shown to be consistent.

gsf, range measurement, robot, (15 more...)

arXiv.org Artificial Intelligence

2402.08566

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
(10 more...)

Genre: Research Report (0.40)

Industry: Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization

Shin, Kihoon, Sim, Hyunjae, Nam, Seungwon, Kim, Yonghee, Hu, Jae, Kim, Kwang-Ki K.

arXiv.org Artificial IntelligenceFeb-4-2024

In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose estimation can be achieved through range-only or bearing-only measurements, provided both robots have non-zero linear velocities. In cases where odometry data from a target robot are not directly transmitted but estimated by the ego robot, both range and bearing measurements are necessary to ensure observability of relative pose estimation. For ROS/Gazebo simulations, we explore four sensing and communication structures. We compare extended Kalman filtering (EKF) and pose graph optimization (PGO) estimation using different robust loss functions (filtering and smoothing with varying batch sizes of sliding windows) in terms of estimation accuracy. In hardware experiments, two Turtlebot3 equipped with UWB modules are used for real-world inter-robot relative pose estimation, applying both EKF and PGO and comparing their performance.

estimation, relative pose estimation, robot, (12 more...)

arXiv.org Artificial Intelligence

2401.15313

Country:

North America > United States > Minnesota (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback